This is an interactive notebook that you can run locally.
🔑 Prerequisites
Before you can run a Weave evaluation, complete the following prerequisites:
- Install the W&B Weave SDK and log in with your W&B API key.
- Install the OpenAI SDK and set your OpenAI API key.
- Initialize your W&B project.
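The prerequisite steps above can be sketched as shell commands; the environment variable name `OPENAI_API_KEY` is the one the OpenAI SDK reads by default, and the key value shown is a placeholder:

```shell
# Install the Weave and OpenAI SDKs
pip install weave openai

# Log in to W&B (paste your API key when prompted)
wandb login

# Make your OpenAI API key available to the SDK (placeholder value)
export OPENAI_API_KEY="your-openai-api-key"
```

Project initialization itself happens in Python, via `weave.init("<project-name>")`, as shown in the code sample below.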
🐝 Run your first evaluation
The following code sample shows how to evaluate an LLM using Weave's `Model` and `Evaluation` APIs. First, define a Weave model by subclassing `weave.Model`, specifying the model name and prompt format, and tracking a `predict` method with `@weave.op`. The `predict` method sends a prompt to OpenAI and parses the response into a structured output using a Pydantic schema (`FruitExtract`). Then, create a small evaluation dataset consisting of input sentences and expected targets. Next, define a custom scoring function (also tracked with `@weave.op`) that compares the model's output to the target label. Finally, wrap everything in a `weave.Evaluation`, specifying your dataset and scorers, and call `evaluate()` to run the evaluation pipeline asynchronously.
🚀 Looking for more examples?
- Learn how to build an evaluation pipeline end-to-end.
- Learn how to evaluate a RAG application.